Expressing DOACROSS Loop Dependences in OpenMP

نویسندگان

  • Jun Shirako
  • Priya Unnikrishnan
  • Sanjay Chatterjee
  • Kelvin Li
  • Vivek Sarkar
چکیده

OpenMP is a widely used programming standard for a broad range of parallel systems. In the OpenMP programming model, synchronization points are specified by implicit or explicit barrier operations within a parallel region. However, certain classes of computations, such as stencil algorithms, can be supported with better synchronization efficiency and data locality when using doacross parallelism with pointto-point synchronization than wavefront parallelism with barrier synchronization. In this paper, we propose new synchronization constructs to enable doacross parallelism in the context of the OpenMP programming model. Experimental results on a 32-core IBM Power7 system using four benchmark programs show performance improvements of the proposed doacross approach over OpenMP barriers by factors of 1.4× to 5.2× when using all 32 cores.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Reducing Data Communication Overhead for Doacross Loop Nests Reducing Data Communication Overhead for Doacross Loop Nests

If the loop iterations of a loop nest cannot be partitioned into independent sets, the data communication for data dependences are inevitable in order to execute them on parallel machines. This kind of loop nests are referred to as Doacross loop nests. This paper is concerned with compiler algorithms for parallelizing Doacross loop nests for distributed-memory multicomputers. We present a metho...

متن کامل

An Empirical Study on DOACROSS Loops

Loop-iteration level parallelism is one of the most common forms of parallelism being exploited by optimizing compilers and parallel machines. In this study, we selected 6 large application programs and used an execution-driven simulation technique from MaxPar 5] to identify and to measure the eeectiveness of concurrent DOACROSS loops execution. It was found that executing DOACROSS loops serial...

متن کامل

On Effective Execution of Nonuniform DOACROSS Loops

It is extremely difficult to parallelize DOACROSS loops with non-uniform loop-carried dependences. In this paper, we present a static scheduling scheme with an accompanying synchronization strategy that can execute such DOACROSS loops effectively and efficiently. Our approach uses one of the parallelization techniques called Dependence Uniformization, which finds a small set of uniform dependen...

متن کامل

Redundant Synchronization Elimination for DOACROSS Loops

Synchronizations are necessary when there are dependences between concurrent processes. However, many synchronizations are redundant because the composite effect of the other synchronizations may have already covered them. The problem of redundant synchronization elimination in DOACROSS loops is investigated and an efficient algorithm that identifies redundant synchronizations in multiply-neste...

متن کامل

Model-Driven Tile Size Selection for DOACROSS Loops on GPUs

DOALL loops are tiled to exploit DOALL parallelism and data locality on GPUs. In contrast, due to loop-carried dependences, DOACROSS loops must be skewed first in order to make tiling legal and exploit wavefront parallelism across the tiles and within a tile. Thus, tile size selection, which is performance-critical, becomes more complex for DOACROSS loops than DOALL loops on GPUs. This paper pr...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013